Trees and Nets


Kerry Back

BUSI 520, Fall 2022
JGSB, Rice University

Trees

Decision tree

  • Split sample successively into smaller subsamples by answering “yes-no” questions.
  • Each question is based on a single variable: is it above a threshold?
  • Split a specified number of times (depth). Final subsamples are called leaves.
  • The prediction for each observation is the mean of the training observations that end up in the same leaf.
  • Variable to split on and threshold are chosen each time to minimize the SSE after the split.
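The splitting procedure above can be sketched with scikit-learn; the simulated data and variable names here are purely illustrative, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # 200 observations, 4 predictors
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# max_depth = number of successive splits; the final subsamples are the leaves
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
yhat = tree.predict(X)                         # each prediction is a leaf mean
n_leaves = tree.get_n_leaves()                 # at most 2**3 = 8 leaves
```

Each split is chosen greedily: scikit-learn scans variables and thresholds and picks the split that most reduces the SSE, exactly as described above.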

Illustration

Random forest

  • Multiple trees fit to random data
  • Data for each tree is a bootstrapped sample:
    • random selection of rows (with replacement)
    • same size as original sample
  • Prediction is average of predictions of the trees
  • Hyperparameters = number of trees and depth of trees
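A minimal sketch of the averaging idea, again with scikit-learn and simulated data (illustrative only). It also checks that the forest's prediction really is the average of its trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# n_estimators and max_depth are the two hyperparameters named above;
# bootstrap=True resamples rows with replacement for each tree
forest = RandomForestRegressor(
    n_estimators=50, max_depth=3, bootstrap=True, random_state=0
).fit(X, y)

# the forest's prediction is the average of the individual trees' predictions
avg = np.mean([t.predict(X) for t in forest.estimators_], axis=0)
same = np.allclose(avg, forest.predict(X))
```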

Gradient boosting

  • Multiple trees
  • First tree fit to data
  • Second tree fit to errors from first tree
  • Third tree fit to errors from second tree, …
  • Prediction is sum of predictions
  • Hyperparameters = number of trees and depth of trees
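The fit-to-the-errors loop can be written out by hand in a few lines (a sketch with simulated data; scikit-learn's `GradientBoostingRegressor` adds a learning rate and other refinements):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

pred = np.zeros_like(y)
sses = []
for _ in range(3):                       # three boosting rounds
    resid = y - pred                     # errors from the trees so far
    tree = DecisionTreeRegressor(max_depth=2).fit(X, resid)
    pred += tree.predict(X)              # prediction is the sum of the trees
    sses.append(((y - pred) ** 2).sum())
# each added tree reduces (or at worst leaves unchanged) the in-sample SSE
```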

Neural networks

Multi-layer perceptrons

  • A multi-layer perceptron (MLP) consists of “neurons” arranged in layers.
  • A neuron is a mathematical function. It takes inputs \(x_1, \ldots, x_n\), calculates \(y=f(x_1, \ldots, x_n)\), and passes \(y\) to the neurons in the next layer.
  • The inputs to the first layer are the predictors.
  • The inputs to each successive layer are the calculations from the prior layer.
  • The last layer is a single neuron that produces the output.

Illustration

  • 4 independent variables (features)
  • 5 functions of the 4 features are calculated in the “hidden layer.”
  • The output is a function of the 5 numbers calculated in the hidden layer.
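The 4-feature, 5-neuron architecture in the illustration can be fit with scikit-learn's `MLPRegressor` (a sketch on simulated data; the convergence settings are illustrative). The fitted weight matrices confirm the shape of the network: 4 inputs into 5 hidden neurons, then 5 hidden outputs into 1 output neuron.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))            # 4 features, as in the illustration
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=300)

# one hidden layer of 5 neurons, then a single linear output neuron
mlp = MLPRegressor(hidden_layer_sizes=(5,), activation="relu",
                   max_iter=2000, random_state=0).fit(X, y)
shapes = [c.shape for c in mlp.coefs_]   # [(4, 5), (5, 1)]
```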

Rectified linear units

  • The usual function for the neurons (except in the last layer) is \[ y = \max(0,b+w_1x_1 + \cdots + w_nx_n)\] Parameters \(b\) (called the bias) and \(w_1, \ldots, w_n\) (called the weights) are different for different neurons.
  • This function is called a rectified linear unit (ReLU). It’s like an option payoff.
  • Last layer uses a linear function \[ y = b+w_1x_1 + \cdots + w_nx_n\]
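The two neuron types above are one line each (a sketch; the function names are mine, not standard):

```python
import numpy as np

def relu_neuron(x, w, b):
    # hidden-layer neuron: y = max(0, b + w.x), the option-payoff shape
    return np.maximum(0.0, b + np.dot(w, x))

def linear_neuron(x, w, b):
    # last-layer neuron: y = b + w.x, with no rectification
    return b + np.dot(w, x)
```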

Analogy to neurons firing


If \(w_i>0\) and \(b<0\), then \(y>0\) only when \(x_i\) are large enough.


A neuron fires when it is sufficiently stimulated by signals from other neurons (in the prior layer).
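A quick numerical check of the firing threshold, with illustrative weights \(w_i>0\) and bias \(b<0\):

```python
import numpy as np

w, b = np.array([1.0, 1.0]), -1.5                # w_i > 0, b < 0
weak = max(0.0, b + w @ np.array([0.5, 0.5]))    # inputs too small: output 0
strong = max(0.0, b + w @ np.array([1.0, 1.0]))  # inputs large enough: "fires"
```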

Deep learning

  • Deep learning means a neural network with many layers.
  • Deep learning is behind facial recognition, self-driving cars, …
  • Need specialized library, probably TensorFlow (from Google) or PyTorch (from Facebook)
  • And probably need a graphics processing unit (GPU) – i.e., run on a video card
  • Can often start from a pretrained model